AITopics | document layout

Collaborating Authors

document layout

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts

Agarwal, Amit, Patel, Hitesh, Pattnayak, Priyaranjan, Panda, Srikant, Kumar, Bhargava, Kumar, Tejaswini

arXiv.org Artificial IntelligenceNov-27-2024

The development of robust Document AI models has been constrained by limited access to high-quality, labeled datasets, primarily due to data privacy concerns, scarcity, and the high cost of manual annotation. Traditional methods of synthetic data generation, such as text and image augmentation, have proven effective for increasing data diversity but often fail to capture the complex layout structures present in real world documents. This paper proposes a novel approach to synthetic document layout generation using Graph Neural Networks (GNNs). By representing document elements (e.g., text blocks, images, tables) as nodes in a graph and their spatial relationships as edges, GNNs are trained to generate realistic and diverse document layouts. This method leverages graph-based learning to ensure structural coherence and semantic consistency, addressing the limitations of traditional augmentation techniques. The proposed framework is evaluated on tasks such as document classification, named entity recognition (NER), and information extraction, demonstrating significant performance improvements. Furthermore, we address the computational challenges of GNN based synthetic data generation and propose solutions to mitigate domain adaptation issues between synthetic and real-world datasets. Our experimental results show that graph-augmented document layouts outperform existing augmentation techniques, offering a scalable and flexible solution for training Document AI models.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.17577/IJERTV13IS100138

2412.0359

Country:

North America > United States > New York (0.04)
North America > Dominican Republic (0.04)
Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Luo, Chuwei, Shen, Yufan, Zhu, Zhaoqing, Zheng, Qi, Yu, Zhi, Yao, Cong

arXiv.org Artificial IntelligenceApr-8-2024

Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has been proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not fully explored and utilized the document layout information, which is vital for precise document understanding. In this paper, we propose LayoutLLM, an LLM/MLLM based method for document understanding. The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts. The proposed layout instruction tuning strategy consists of two components: Layout-aware Pre-training and Layout-aware Supervised Fine-tuning. To capture the characteristics of document layout in Layout-aware Pre-training, three groups of pre-training tasks, corresponding to document-level, region-level and segment-level information, are introduced. Furthermore, a novel module called layout chain-of-thought (LayoutCoT) is devised to enable LayoutLLM to focus on regions relevant to the question and generate accurate answers. LayoutCoT is effective for boosting the performance of document understanding. Meanwhile, it brings a certain degree of interpretability, which could facilitate manual inspection and correction. Experiments on standard benchmarks show that the proposed LayoutLLM significantly outperforms existing methods that adopt open-source 7B LLMs/MLLMs for document understanding. The training data of the LayoutLLM is publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LayoutLLM

dataset, information, layoutllm, (16 more...)

arXiv.org Artificial Intelligence

2404.05225

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types

Rabby, AKM Shahariar Azad, Ali, Hasmot, Islam, Md. Majedul, Abujar, Sheikh, Rahman, Fuad

arXiv.org Artificial IntelligenceFeb-7-2024

This research paper presents a unique Bengali OCR system with some capabilities. The system excels in reconstructing document layouts while preserving structure, alignment, and images. It incorporates advanced image and signature detection for accurate extraction. Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and handwritten documents. The system handles static and dynamic handwritten inputs, recognizing various writing styles. Furthermore, it has the ability to recognize compound characters in Bengali. Extensive data collection efforts provide a diverse corpus, while advanced technical components optimize character and word recognition. Additional contributions include image, logo, signature and table recognition, perspective correction, layout reconstruction, and a queuing module for efficient and scalable processing. The system demonstrates outstanding performance in efficient and accurate text extraction and analysis.

document type, ocr system, recognition, (14 more...)

arXiv.org Artificial Intelligence

2402.05158

Country:

Asia > Singapore (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an Entity-Centric Perspective

Zhang, Chong, Zhao, Yixi, Yuan, Chenshu, Tu, Yi, Guo, Ya, Zhang, Qi

arXiv.org Artificial IntelligenceFeb-4-2024

Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable success in multiple information extraction tasks on visually-rich documents. However, the prevailing evaluation pipeline may not be sufficiently robust for assessing the information extraction ability of PTLMs, due to inadequate annotations within the benchmarks. Therefore, we claim the necessary standards for an ideal benchmark to evaluate the information extraction ability of PTLMs. We then introduce EC-FUNSD, an entity-centric benckmark designed for the evaluation of semantic entity recognition and entity linking on visually-rich documents. This dataset contains diverse formats of document layouts and annotations of semantic-driven entities and their relations. Moreover, this dataset disentangles the falsely coupled annotation of segment and entity that arises from the block-level annotation of FUNSD. Experiment results demonstrate that state-of-the-art PTLMs exhibit overfitting tendencies on the prevailing benchmarks, as their performance sharply decrease when the dataset bias is removed.

annotation, benchmark, dataset, (16 more...)

arXiv.org Artificial Intelligence

2402.02379

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Add feedback

Machine Learning is the Wrong Way to Extract Data From Most Documents

#artificialintelligenceJul-27-2022, 05:39:03 GMT

Documents have spent decades stubbornly guarding their contents against software. In the late 1960s, the first OCR (optical character recognition) techniques turned scanned documents into raw text. By indexing and searching the text from these digitized documents, software sped up formerly laborious legal discovery and research projects. Today, Google, Microsoft, and Amazon provide high-quality OCR as part of their cloud services offerings. But documents remain underused in software toolchains, and valuable data languish in trillions of PDFs.

document layout, representational mode, template, (12 more...)

#artificialintelligence

Country: North America > United States (0.05)

Industry: Banking & Finance > Insurance (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.55)

Add feedback